Question type-sensitive answer summarization

ABSTRACT

In a question answering system, the system identifies a type of question input by a user. The system then generates answer summaries that summarize answers to the input question in a format that is determined based on the type of question asked by the user. The answer summaries are output, in the corresponding format, in answer to the input question.

BACKGROUND

There are many different types of techniques for discovering information using a computer network. One specific technique is referred to as a community-based question and answering service (referred to as cQA). The cQA service is a kind of web service through which people can post questions and also answer other people's questions on a web site. The growth of cQA has been relatively significant, and it has recently been offered by commercially available web search engines.

In current cQA services, a community of users either subscribes to the service, or simply accesses the service through a network. The users in the community can then post questions that are viewable by other users in the community. The community users can also post answers to questions that were previously submitted by other users. Therefore, over time, cQA services build up very large archives of previous questions and answers posted for those previous questions.

In some current cQA systems, once the user who asked a question has received a sufficient number of answers, that user can choose the best answer from those that have been received. The user may also simply leave it to the community to vote for the best answer. This results in large question and answer archives, representing knowledge repositories, that can be used later.

For instance, these repositories can be used to expedite answers to the user. In other words, there is typically a lag time between the time when a user in the community posts a question, and the time when other users of the community post answers to that question. In order to avoid this lag time, some cQA services automatically search the archive of questions and answers to see if the same question has previously been asked. If the question is found in the archives, then one or more previous answers can be provided, in answer to the current question, with very little delay. This type of searching for previous answers is sometimes referred to as “question search” or “on-line search”.

The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.

SUMMARY

In a question answering system, the system identifies a type of question input by a user. The system then generates answer summaries that summarize answers to the input question in a format that is determined based on the type of question asked by the user. The answer summaries are output, in the corresponding format, in answer to the input question.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. The claimed subject matter is not limited to implementations that solve any or all disadvantages noted in the background.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating one embodiment of an answer summarization system.

FIG. 2 is a flow diagram illustrating the overall operation of the system shown in FIG. 1.

FIGS. 2A and 2B illustrate answer and question taxonomies, respectively, that are used, in one embodiment, in order to identify question type.

FIG. 3 is a block diagram of one illustrative embodiment of a question answering system.

FIG. 4 is a flow diagram illustrating one embodiment of the overall operation of the system shown in FIG. 3.

FIG. 5 is a block diagram of one embodiment of a computing environment.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of one embodiment of an answer summarization system 100. System 100 includes question-focused summarization component 102 that, itself, includes question type identifier 104 and answer summarization component 106. Component 102 is shown having access to question/answer data store 108. Question/answer data store 108 is illustratively an archive of questions input in a community question answering system, along with the answers posted for those questions. Component 102 accesses the questions and answers in data store 108, summarizes the answers based on the question type, and outputs the questions with summarized answers to data store 110.

FIG. 2 is a flow diagram illustrating one embodiment of the overall operation of the system shown in FIG. 1. Component 102 first receives, or accesses, the questions and answers in data store 108. This is indicated by block 120 in FIG. 2. Next, question type identifier 104 in component 102 identifies the type of a selected question. This is indicated by block 122 in FIG. 2.

Answer summarization component 106 then generates answer summaries, in a form corresponding to the type of the question identified. This is indicated by block 124 in FIG. 2. In doing so, component 106 attempts to summarize the answers that were identified by users as the best answers to the question being processed.

In analyzing the types of questions asked in cQA services, it has been found that some types of questions have best answers that are reusable in later searching, while others do not. For instance, questions that ask for answers that are facts may likely have a best answer that is extremely useful in later searching. However, cQA questions that ask for an opinion may not be helpful in later information searching.

For example, a fact-based, closed ended question, such as “What was the birth date of Abraham Lincoln?” will have one unique answer. Therefore, an answer that is labeled in a cQA system as the best answer to that question will likely be reusable and very relevant in later information searching. However, a question such as “What actress has the most seductive voice?” asks for an opinion. Therefore, even the answer labeled by the community as the best answer is still only an opinion. Thus, those answers may not be as reusable in later information searching processes.

Similarly, even fact-based open-ended questions may not be as reusable as a closed-ended question. For instance, a question such as “Does anyone know of any movie stars having birthdays coming up soon?” is open ended. First of all, the time period of the question is relevant because it asks for birthdays that are “coming up soon”. Similarly, there may be a wide variety of different people answering the question, and so they may provide a wide variety of different birthdays for different movie stars, depending on the type of movie that they enjoy. Thus, answers to this type of fact-based open-ended question, even if they are labeled as “best answers”, may be less reusable or less relevant to later information searching processes.

Therefore, in order to generate summaries of answers to cQA questions, question type identifier 104 identifies the question type, and answer summarization component 106 identifies the types of answers to the identified question, and generates a summary of the best answers (and optionally other answers as well) in a format that is determined based on the type of the question identified.

FIG. 2A shows a taxonomy 200 that can be used in identifying a type of answer to a question. Taxonomy 200 is based on whether the best answer can be reused when a question that is similar to the best answer's question is asked again. Answer type taxonomy 200 first divides best answers 202 to cQA questions into two categories. Those categories are reusable 204 and not reusable 206. A reusable best answer is one that can be reused as the answer if a question similar to the one it answered is asked again. A not reusable best answer is one that cannot be reused in this way.

The reusable best answers are further divided into factual best answers 208 and subjective best answers 210. A factual best answer is a fact that can be used as the best answer, while a subjective best answer is an opinion type of answer that can be used as the best answer.

The factual best answers 208 can be divided into two further types, those being unique best answers 212 and non-unique best answers 214. A unique best answer has only a single best answer to its question. A non-unique best answer has alternative best answers.

These alternative (non-unique) best answers 214 can be divided into direct answers 216 and indirect best answers 218. A direct answer is just that, an answer that answers its question directly. An indirect answer is an answer which answers the question through inference.

The non-reusable best answers 206 are divided into two types: relevant best answers 220 and irrelevant best answers 222. A relevant best answer 220 is not reusable as a best answer to its question, if that question is asked at a later time, but it is relevant to the question. An irrelevant best answer 222 is not reusable as a best answer to its question and it is irrelevant to its question.

Table 1 summarizes answer type taxonomy 200 and gives examples for each answer type.

TABLE 1 Summary of cQA Answer Taxonomy with Examples

Type: Reusable, Factual, Unique
Description: BA is an accurate and complete answer for its question.
Example:
Question: Which country has the largest population?
BA: China

Type: Reusable, Factual, Not Unique, Direct
Description: BA is one of the best answers to its question.
Example:
Question: How can I recover data erased on my computer?
BA: There's a program I used to have called ‘ABC Utilitities 2006’, I think you can recover lost files with it. It's not free but you can download a free 30 day trial http://www.ABC.com/ Hope this works for you
A: ABC's GetDataBack . . .
A: http://www.undelete.com/file-recovery.as . . . God Bless
A: . . . use SYSTEM RESTORE . . .

Type: Reusable, Factual, Not Unique, Indirect
Description: BA is the best answer to the inferred goal of its question. There are also user responses answering its question directly.
Example:
Question: Where can i download duckwire from because every website i go on wants to charge me?
A: Go to duckwire.com and download the basic version, which is free.
BA: you may as well just avoid all the confusion and get chickwire then: http://www.duckwire.com/ it's identical to duckwire, but open source so it'll always be free

Type: Reusable, Subjective
Description: BA is one of the best subjective answers for its question. Other subjective answers might be chosen as the best answer if askers are different.
Example:
Question: What is the best sci-fi movie u ever saw?
A: XYZ Movie
A: DEF Movie
BA: GHI Movie

Type: Not Reusable, Relevant
Description: BA is relevant to its question but it cannot be reused.
Example:
Question: Why was “The ABC Song” by John Doe so shortlived?
BA: I'm not sure where you live, but in NJ, especially South Jersey, that song was played out. It may have made the New Year's “We're never playing this song again” list.

Type: Not Reusable, Irrelevant
Description: BA is unrelated to the question itself.
Example:
Question: How to forward e-mail without the email addresses in the To Box?
BA: It appears that the question period has expire. If an answer has been given that meets your needs, please pick a ‘best answer’. Regards.
A: put those e-mails that you don't want no one to see in the bcc box and then send

Type: Not Reusable, Irrelevant
Description: BA's question is nonsense, and so are its answers.
Example:
Question: *oink* Whose gunna partay with me?
BA: I'll party w/you I'm old enough to drink but it's over rated. I'll bring the cheese in the can HAPPY NEW YEAR

Table 1 first shows a reusable, factual, unique best answer. This best answer type provides enough useful information in the answer, and no other answers add more information. For instance, suppose a question asks “Which country has the largest population?” and the best answer is “China”. This answer is accurate and complete, so it can be categorized as the unique type.

The difference between direct and indirect answer types is a bit subtle. A direct best answer answers its question in a straightforward manner, while an indirect best answer answers its question in an inferred manner. Answerers of indirect best answers usually look beyond the surface level of the question and try to give answers that satisfy the inferred needs behind the question. For example, a question asked “Where can i download fatwire from because every website i go on wants to charge me?” and the best answer was “http://www.duckwire.com/”. This answer addresses the ‘free’ and ‘download’ parts of the question. However, instead of giving a FatWire download site, the BA suggested an alternative, i.e. DuckWire, that presumably would fulfill what the asker really wanted. Notice that a possible Direct BA suggested downloading the basic version of FatWire, which is free, from Fatwire.com. Assume FatWire is a popular peer to peer (P2P) file sharing tool, and that it has two versions. The Basic version is free but the Pro version is not free. It can be inferred from the question that the asker wanted the Pro version and not the Basic. A true Direct BA would be a site where the asker could download the Pro version without paying, but such a site, even if it exists, would very likely be illegal. Therefore the DuckWire suggestion is likely a very good alternative. Another example of Direct vs. Indirect BA is shown earlier in Table 1. The question asked for celebrity birthdays in the coming month (November). A Direct BA contains a list of celebrity birthdays in November as shown in one of the answers in Table 1. A better alternative, i.e. an Indirect one, showed the asker where she can find this information herself not only for November but for all other months.

A Subjective BA answers questions that look for personal opinions or recommendations. For example, a question asked “What is the best sci-fi movie u ever saw?” Each answerer would have his or her own opinion about what the best sci-fi movie is. There might be consensus but it will likely not be unique. Many answers in the Entertainment & Music category are this type.

Answers that fall into the Not Reusable category could be relevant or irrelevant to their questions. Despite their relevance, they could not be used as the BAs. For example, a question asked “Why was “The ABC Song” by John Doe so shortlived?” A Relevant BA said “I'm not sure where you live, but in NJ, especially South Jersey, that song was played out. It may have made the New Year's “We're never playing this song again” list.” This BA seems relevant to the question but we could not be sure it is the BA without knowing where the asker resided.

There are many possible reasons that make a BA irrelevant. Two exemplary possible explanations are listed in Table 1. One is that the BA does not answer its question. For example, a question asked for how to forward an email without showing the email addresses in the To box. There is an answer that contains the correct method “put the email address into the bcc box”, but the best answer voted by one community is: “It appears that the question period has expired. If an answer has been given that meets your needs, please pick a ‘best answer’.” Another one is that the BA's question is nonsense and therefore its answers are as well. For example, someone asked “*oink* Whose gunna partay with me?” Though a BA was chosen, it does not make sense to reuse the BA.

FIG. 2B shows a question taxonomy 300 that can be used to categorize cQA questions and guide answer summarization. In taxonomy 300, a cQA question 302 can be divided into four top-level categories: navigational 304, informational 306, transactional 308, and social 310. A navigational question is one which intends to find the uniform resource locator (URL) of a web site that the questioner has in mind. An informational question intends to acquire information about a question topic. A transactional question intends to obtain some resource. A social question intends to enlist interaction with people in the cQA communities on which the question is posted.

The informational question 306 includes questions that request information about topics, and can be divided into constant questions 312 and dynamic questions 314. Constant questions have a fixed or stable set of answers, while dynamic questions do not.

The dynamic questions 314 can be further divided into three categories: opinion questions 316, context-dependent questions 318 and open questions 320. Opinion questions are those asking for opinions. Questions in this category seek opinions from people in the cQA community about what they think of someone, some event, some object, etc. Context-dependent questions 318 are those questions which have different answers, depending on the different context in which they are asked. For example, the question “What is the population of China?” will have different answers according to the date on which it is asked. Open questions 320 are those questions that ask for some facts or methods but have a variety of answers, or the answer itself may have unconstrained depth.

The transactional questions 308 can further be divided into possessive questions 322 and interactive questions 324. Possessive questions 322 indicate the asker's intention to own the requested resource, regardless of whether a computer is required to make use of that resource. The interactive questions 324 are questions looking for entertainment or utility resources.

Social questions 310 do not intend to obtain any given answer. These questions include telling jokes, and expressing the asker's own ideas. For instance, one social question may be “Why do so many lazy people come on here simply just to ask . . . ?” The question is actually expressing a negative sentiment toward the large number of people who ask how to accomplish a certain task in the cQA community.

Table 2 illustrates question taxonomy 300, and gives examples of each type of question.

TABLE 2 Community Question Answering Service (cQA) Question Taxonomy with Examples

Question Type: Navigational
Description: The question asks about URL information of specific websites.
Example: The official website of the us lottery program for immigration?

Question Type: Informational
Description: The question asks for information about something.

Question Type: Informational, Constant
Description: The question requires no additional information and has fixed answers.
Example: In a word ‘tetra-hadron’ what exactly ‘a-tetra’ denotes?

Question Type: Informational, Dynamic
Description: The question has wide ranges of answers.

Question Type: Informational, Dynamic, Opinion
Description: The question seeks people's opinions of something.
Example: What is the best sci-fi movie u ever saw?

Question Type: Informational, Dynamic, Context-dependent
Description: The question needs additional context information to limit the range of answers.
Example: Who won XYZ Gameshow?

Question Type: Informational, Dynamic, Open
Description: The question has a variety of answers or the answer can have unconstrained depth.
Example: Can you list some birthdays of stars in the coming week?

Question Type: Transactional
Description: The question intends to get some resources.

Question Type: Transactional, Possessive
Description: The question intends to find some files that are downloadable or something that can be read off the screen or printed.
Example: Is there a computer program that lets you create a planet? Installing ABC Software - serial number?

Question Type: Transactional, Interactive
Description: The question intends to find some websites that one can log on it and do something.
Example: A good weather website?

Question Type: Social
Description: This question is just chatting, joking, or expressing one's own idea and doesn't mind to get an answer.
Example: Ever been on a boat ride? Two men who are sitting in a pub?

Identifying the question type and answer type as set out in FIG. 2 can be done automatically, or simply by asking the user to select (such as from a drop down box) the type of question or answer that is being provided.

In using these taxonomies and analyzing large archives of cQA responses, it has been found that a number of the different question types have answers, less than 50 percent of which can be reused in later searching given a similar type of question. Therefore, in one embodiment, answer summarization component 106 uses question-focused multi-document summarization techniques to generate the best answers for reuse in answering questions that are submitted again, later, in the cQA service.

Generating summaries is discussed in greater detail below with respect to Tables 3-8. In any case, however, once the summaries are generated, based on the type of question that the answers are provided for, the questions and answer summaries (which may be best answer summaries) are output for use in other information searching systems (such as in question answer systems). In that instance, the questions with summarized answers can be stored in data store 110 shown in FIG. 1. This is indicated by block 126 in FIG. 2.

Some illustrative embodiments of summarizing answers, based on question type, will now be described. In doing this, answer summarization component 106 illustratively invokes one of a plurality of different summarization modules 1-N, one of which exists for each different question type. Examples of some illustrative summarization modules are shown in Tables 3, 5, and 6. The present discussion proceeds with respect to applying question-focused multi-document summarization techniques to create answer summaries for open and opinion questions (questions 316 and 320) as shown in FIG. 2B. Of course, these techniques can be used with other question types as well, and the open and opinion questions are discussed herein for the sake of example only.
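By way of illustration only, the selection of one summarization module per question type can be sketched as a simple lookup. The registry, the type labels ("open", "opinion") and the placeholder module bodies below are assumptions made for this sketch and are not part of the described system.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a question type label to its summarization module.
Summarizer = Callable[[List[str]], str]
SUMMARIZERS: Dict[str, Summarizer] = {}


def register(question_type: str):
    """Decorator that records a summarization module for one question type."""
    def decorator(fn: Summarizer) -> Summarizer:
        SUMMARIZERS[question_type] = fn
        return fn
    return decorator


@register("open")
def summarize_open(answers: List[str]) -> str:
    # Placeholder body: a fuller module would cluster the answers
    # and keep one key answer per cluster.
    return " | ".join(answers)


@register("opinion")
def summarize_opinion(answers: List[str]) -> str:
    # Placeholder body: a fuller module would tally support and opposition.
    return f"{len(answers)} opinions received"


def summarize(question_type: str, answers: List[str]) -> str:
    """Invoke the module registered for the identified question type."""
    module = SUMMARIZERS.get(question_type)
    if module is None:
        # Assumed fallback for unregistered types: return the first answer as is.
        return answers[0] if answers else ""
    return module(answers)
```

Keeping the modules behind a lookup means that a module for another question type can be added without changing the calling code.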

In one embodiment, answer summarization component 106 employs the summarization steps set out in Table 3 in order to summarize open-type questions 320. The steps in Table 3 follow multi-document summarization procedures which include topic identification, interpretation and fusion, and then summary generation. In the topic identification phase, answers are clustered using any of a wide variety of relatively simple clustering techniques. For instance, cosine measure, which is a known similarity measure, can be used to estimate answer similarity and an empirically determined threshold can be used to merge answers into clusters. This is indicated in step 1 of Table 3.

TABLE 3 Summarization Algorithm of Open-type Questions

1. Employ the clustering algorithm on answers.
2. Extract the noun phrases in every cluster by using a shallow parser.
3. For each cluster and each label (or noun phrase), calculate the score by using the Relevance Scoring Function:

   $\sum_{w} p(w \mid \theta)\,\mathrm{PMI}(w, l \mid C) - D(\theta \mid C)$

   where θ is the cluster, w is the word, l is the label or noun phrase, C is the background context which is composed of 5,000 questions in the same category, p(·) is conditional probability, PMI(·) is pointwise mutual information, and D(·) is KL-divergence.
4. Extract the key answer which contains the noun phrase that has the highest score in each cluster.
5. Rank these key answers by the cluster size and present the results.

Next, in order to understand the main idea of each cluster, salient noun phrases can be extracted using any of a wide variety of shallow parsing components. This is indicated in step 2 of Table 3. Each label (or noun phrase) in each cluster can then have a relevance score calculated for it, such as by using a known first-order relevance scoring function also sometimes referred to as automatic labeling of multinomial topic models. This is set out in step 3 in Table 3. The relevance function can be used to extract a key answer that contains the noun phrase that has the highest score (calculated in step 3) in each cluster, and the key answers can also be ranked, within each cluster, based on the relevance score. This is indicated by step 4 in Table 3.

Finally, the summarized answer can be presented to the user, by extracting the most important answer in each cluster, and sorting those most important answers according to the cluster size that spawned the answers. This is indicated by step 5 in Table 3.
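As a minimal sketch of steps 1 and 5 only, the following groups answers by cosine similarity against a threshold and then orders the clusters by size. The bag-of-words representation, the greedy single-pass merging strategy, the function names and the threshold value of 0.5 are all assumptions made for illustration; noun phrase extraction and relevance scoring (steps 2 through 4) are not shown.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts (an assumed, very simple text representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_answers(answers, threshold=0.5):
    """Greedy single-pass clustering: an answer joins the most similar
    existing cluster if the similarity reaches the threshold, otherwise
    it starts a new cluster. The threshold here is an arbitrary example."""
    clusters = []  # each entry: {"centroid": Counter, "members": [str, ...]}
    for answer in answers:
        vec = bag_of_words(answer)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [answer]})
        else:
            best["centroid"].update(vec)  # accumulate counts as the centroid
            best["members"].append(answer)
    return [cluster["members"] for cluster in clusters]


def rank_by_cluster_size(clusters):
    """Step 5: larger clusters first (ties keep their original order)."""
    return sorted(clusters, key=len, reverse=True)
```

For example, `rank_by_cluster_size(cluster_answers(answers))` yields the answer groups from which one key answer per cluster would be drawn.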

Table 4 illustrates an example of summarization of results of an open-type question. The question asked for methods of changing a first style of operating system to a second style. There are many kinds of software that have this functionality. The best answer to the question only listed a single choice, the ACMEDOCK product, while other answers suggested ABC and DEF and GHI products. The summarized answer contains a variety of software that have this functionality and a web page link that lists all kinds of software for turning the first style of operating system into the second style.

TABLE 4 Summarization Result of Open-Type questions

Question (http://answers.alpha.com/question/?qid=1005120801427)
What is the best way to make 123 look like Mac osX?

Best Answer Chosen
I found the best way to do this is to use BetaBlinds. A program that, if you use the total AMCEDock, package will allow you to add the ObjectBar in addition to changed the toolbars to be OPSX stylized. If you want added functionality you can download programs off the internet that will mimic the Expose feature which will show you a tiled set of all open windows. Programs that will do this include: ABC, DEF, and GHI

Auto-summarized Answer
An additional option is JKL - a “Shell Replacement” for ABC that has a variety of themes you can install. Undoubtedly there are various OPSX themes available for JKL. I have included a source to a max OPSX theme for JKL at customize.org. DOE is a transformation pack and the most comprehensive in terms of converting an XP system's look to that of an OPSX system, google it up and you should find it, v3 seems to be in development and should be out soon. Beta http://www.ACMEdock.com/products/Beta . . .

Manually-summarized Answer
There are multiple ways of solution. One is to use Beta. The package will allow you to add the ObjectBar for changing to the OPSX theme. You can also make added functionality of Expose feature by downloading the programs like ABC, DEF and GHI. The download URL of it is http://www.ACMEdock.com/products/beta/. Another option is to use DOE which is a transformation pack. The third Option is the JKL, it is a “Shell Replacement” for windows that has a variety of OPSX tehmes you can install. The download url is http://JKL.net and I have included a source of OPSX theme for JKL at http://www.customize.org/details/33409. To have a complete study of the methods of converting 123 style to OPSX style, you may also refer to http://features.JKEL.com/entry/8828351836181248/ for detail information.

Table 5 shows a question-focused answer summarization technique for opinion type questions. Opinion type questions can be divided into two subcategories. One is sentiment-oriented questions that ask the sentiment about something. For instance, “What do you think of . . . ” is a sentiment type question. The other type of opinion question is a list-oriented question that requests a list of answers and looks to determine which item is most popular. In sentiment-oriented questions, the askers likely care about how many people support or oppose something. Table 5 shows steps for summarizing answers to sentiment-type questions. Table 6 shows steps for summarizing list-type questions. Tables 7 and 8 show summaries generated for sentiment-type questions and list-type questions, respectively. Of course, the methods in Tables 5 and 6 and the summaries in Tables 7 and 8 are exemplary only.

TABLE 5 Summarization Algorithm of Sentiment Questions

For each answer:
1. If an answer begins with a negative cue phrase (e.g. “No, it isn't” etc.), it is annotated as Against. If a response begins with a positive cue phrase (e.g. “Yes, it is” etc.), it is annotated as Support.
2. For a clause, if the number of positive sentiment words is larger than the number of negative sentiment words, the sentiment of the clause is Positive. Otherwise, the sentiment of the clause is Negative.
3. If there are negative indicators such as “don't/never/ . . . ” in front of the clause, the sentiment should be reversed.
4. If the number of negative clauses is larger than the number of positive clauses, the sentiment of the answer is Negative. Otherwise, the sentiment of the answer is Positive.
5. Denote the sentiment value of the question as s(q) and the sentiment value of an answer as s(a); then the final sentiment of the answer is the logical AND of s(q) and s(a).

TABLE 6 Summarization Algorithm of List Questions

1. Segment the answers into sentences.
2. Cluster sentences.
3. For each cluster, choose the key sentence based on mutual information between itself and other sentences within the cluster.
4. Rank the key sentences by the cluster size.

TABLE 7 Summary of Sentiment-Opinion Question

Question (http://answers.Alpha.com/question/?qid=1006050125145)
I am 16 and i stil talk to my erm..teddy bear . . . am i wierd???

Best Answer Chosen
not at all i'm 14 and i too do that

Auto-summarized Answer
A: No, you're not weird . . . you're Pratheek! :D (Against)
A: no, i like to hold on to my old memories too. i do it sometimes too. (Against)
A: It's might be a little uncommon for a 16 year old to talk to a teddy bear but there would be a serious problem if you told me that your teddy bear answered back as you talked to him!!:) (Support)
A: You still has a teddy bear, wow that really uncommon for a boy your age to have a teddy bear. But I GUESS IT GOOD because you talk to it. (Support)
A: Yes. (Support)
A: i talk to a seed im growing .. its not weird . . . :) (Support)
A: It will get weird when he starts to answer back! (Against)
A: not really. it depends how you talk i mean not if you talk to it like its a little kid like my brother does. (Against)
A: I slept with my teddy bear until I graduated. Can't say that I ever had a conversation with him, but if I had I'm sure he would've been a very good listener. (Support)
A: You may pass GO . . . (Neutral)
A: not at all i'm 14 and i too do that (Against)
Overall Attitude: Support 5 Votes, Neutral 1 Votes, Against 5 Votes

Manually-summarized Answer
against (vote 5)
reasons: i like to hold on to my old memories too. (vote 1) I slept with my teddy bear until I graduated. (vote 1) i'm 14 and i too do that (vote 1)
support (vote 4)
neuter (vote 2)

TABLE 8 Summary of List-Opinion Question

Question (http://answers.ABC.com/question/?qid=20060718083151AACYQJn)
What is the best sci-fi movie u ever saw?

Best Answer Chosen
Independance Day

Auto-summarized Answer
DEF Movie (5)
GHI Movie (3)
fi movie has to be JKL Movie (2)
But the best “B” sci (2)
I liked MNO Movie it didn't scare me and I though they did a great job receating Egypt (3)
PQR Movie (3)

Manually-summarized Answer
DEF Movie (vote 6); GHI Movie (vote 3); JKL Movie (vote 2); MNO Movie (vote 2); PQR Movie (vote 2); STU Movie (vote 2); VWX Movie v.s XYZ Movie (vote 1); ABC123 Movie (vote 1); DEF456 Movie (vote 1); GHI789 Movie (vote 1); JKL123 Movie (vote 1); MNO456 Movie (vote 1); PQR789 Movie (vote 1); STU123 Movie (vote 1); VWX456 Movie (vote 1)

As shown in Table 5, answers are summarized by first determining whether they begin with a negative cue phrase or a positive cue phrase. This can be done simply by using an opinion-word dictionary, that is, a dictionary of opinion words, and a cue phrase list that divides cue phrases into positive and negative cue phrases. If the answer begins with a negative cue phrase, it is annotated as being “against” what is proposed, and if it begins with a positive cue phrase it is annotated as “supporting” what is proposed.

If the answer cannot be annotated in step 1, then, for a clause, answer summarization component 106 determines whether the number of positive sentiment words in the clause is larger than the number of negative sentiment words. If so, the clause is annotated “positive” and if not, it is annotated “negative”.

In step 3, if there are negative indicators such as “don't”, “never”, etc. in front of the clause, then the sentiment is reversed.

In step 4, if the number of negative clauses is larger than the number of positive clauses in the answer, then the sentiment of the answer is “negative”. Otherwise, it is “positive”.

Finally, in step 5 of Table 5, the sentiment value of the question is denoted as s(q), and the sentiment value of the answer is denoted as s(a). The final sentiment of the answer is the logical Boolean AND of s(q) and s(a).
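The steps of Table 5 can be sketched roughly as follows. The cue phrase lists, the sentiment word sets and the negation indicators are tiny hypothetical stand-ins for the opinion-word dictionary and cue phrase list, clauses are approximated by splitting on punctuation, and a negation indicator anywhere in a clause is treated as reversing it. Sentiment values are modeled as Booleans (True meaning positive) so that step 5 can be written as a logical AND; that encoding is an assumption of this sketch.

```python
import re

# Hypothetical, deliberately tiny lexicons used only for illustration.
NEGATIVE_CUES = ("no", "not at all", "not really")
POSITIVE_CUES = ("yes", "yeah")
POSITIVE_WORDS = {"good", "great", "like", "love", "fine"}
NEGATIVE_WORDS = {"bad", "weird", "strange", "hate", "awful"}
NEGATORS = {"not", "never", "don't", "dont", "isn't"}


def tokens(text):
    """Lower-cased word tokens; straight apostrophes are kept inside words."""
    return re.findall(r"[a-z']+", text.lower())


def cue_annotation(answer):
    """Step 1: annotate by the cue phrase the answer begins with, if any."""
    text = " ".join(tokens(answer))
    for cue in NEGATIVE_CUES:
        if text == cue or text.startswith(cue + " "):
            return "Against"
    for cue in POSITIVE_CUES:
        if text == cue or text.startswith(cue + " "):
            return "Support"
    return None  # no cue phrase found; fall through to steps 2 to 4


def clause_is_positive(clause):
    """Steps 2 and 3: compare sentiment word counts, then apply negation."""
    words = tokens(clause)
    positive = sum(w in POSITIVE_WORDS for w in words)
    negative = sum(w in NEGATIVE_WORDS for w in words)
    result = positive > negative
    if any(w in NEGATORS for w in words):
        result = not result  # simplification: negator anywhere in the clause
    return result


def answer_is_positive(answer):
    """Step 4: the answer is negative only if negative clauses outnumber
    positive ones."""
    clauses = [c for c in re.split(r"[.,;!?]+", answer) if c.strip()]
    flags = [clause_is_positive(c) for c in clauses]
    return not (flags.count(False) > flags.count(True))


def final_sentiment(question_positive, answer_positive):
    """Step 5: logical AND of s(q) and s(a)."""
    return question_positive and answer_positive
```

A caller would try `cue_annotation` first and fall back to `answer_is_positive` only when no cue phrase is found, mirroring the order of the steps in Table 5.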

In Table 6, the steps for summarizing answers to list questions includefirst segmenting the answers into sentences and clustering the sentencesusing the clustering technique set out above in Table 3. This isindicated by steps 1 and 2 in Table 6.

Next, for each cluster, a key sentence is chosen based on the mutual information between it and the other sentences within the cluster. The key sentences are then ranked by the size of the cluster from which each was chosen. This is indicated by steps 3 and 4 in Table 6.

For the list-oriented questions, it can be seen that a simple counting algorithm tallies the different answers to a question, together with their supporting votes, to obtain the answer summaries.
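The list-question steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the clustering technique of Table 3 is replaced here by a greedy grouping on Jaccard word overlap, and the mutual-information criterion for the key sentence is replaced by total word overlap with the other sentences in the cluster. All function names and the threshold value are hypothetical.

```python
import re


def split_sentences(answer):
    """Step 1: segment an answer into sentences."""
    return [s.strip() for s in re.split(r"[.!?\n]+", answer) if s.strip()]


def word_set(sentence):
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sentences(sentences, threshold=0.5):
    """Step 2 (stand-in): greedily attach each sentence to the first
    cluster whose first member is similar enough, else start a new one."""
    clusters = []
    for sentence in sentences:
        for cluster in clusters:
            if jaccard(word_set(sentence), word_set(cluster[0])) >= threshold:
                cluster.append(sentence)
                break
        else:
            clusters.append([sentence])
    return clusters


def key_sentence(cluster):
    """Step 3 (stand-in): pick the sentence with the greatest total
    overlap with the other sentences in its cluster."""
    def score(i):
        return sum(
            jaccard(word_set(cluster[i]), word_set(other))
            for j, other in enumerate(cluster)
            if j != i
        )
    return cluster[max(range(len(cluster)), key=score)]


def summarize_list(answers):
    """Step 4: rank key sentences by cluster size (the supporting votes)."""
    sentences = [s for a in answers for s in split_sentences(a)]
    clusters = sorted(cluster_sentences(sentences), key=len, reverse=True)
    return [(key_sentence(c), len(c)) for c in clusters]
```

For example, `summarize_list(["DEF Movie", "GHI Movie.", "def movie!", "DEF Movie", "GHI Movie"])` groups the three mentions of one title and the two mentions of the other, and lists the larger group first with its count.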

It can be seen that Table 7 presents the summarization result of a sentiment-oriented question which asks whether it is strange for a 16-year-old child to talk to a teddy bear. The best answer is a negative response. However, if all answers are considered, it can be seen that half of the answers agree, while the other half disagree. This tally is thus presented in the answer summarization.

Table 8 shows the summarization result of a list-oriented question which asks for users to identify the best science fiction movie they have ever seen. The best answer just gives a single choice, while the summarized answer gives a list of best science fiction movies with the number of supporting votes.

FIG. 3 is a block diagram illustrating a runtime question answering system 350 which can be used to present answers, to previously asked questions, to a user who submits a current question. FIG. 4 is a flow diagram illustrating one embodiment of the overall operation of the system shown in FIG. 3.

Question search component 354 first receives an input question 352 from a user. This is indicated by block 380 in FIG. 4. Component 354 then determines whether any questions in data store 110 (which contains questions with summarized answers) are similar to question 352. This is indicated by block 382 in FIG. 4. In doing so, component 354 can use any of a wide variety of different techniques that are used to identify similarity between two textual inputs. For instance, component 354 can identify the question type and keywords in the question and compare them with the question types and keywords (content) in data store 110. Of course, other more sophisticated natural language processing techniques can be used as well. For instance, component 354 can convert the input question 352 into a syntax parse tree or logical form (content), and that can be compared to syntax parse trees or logical forms generated for the questions in data store 110. A wide variety of other processing techniques can be used as well, in order to identify questions in data store 110 that are similar to input question 352.

In any case, once the similar questions are identified in data store 110, the answers to those questions are retrieved from data store 110. This is indicated by block 384 in FIG. 4. Component 354 then outputs the question and answer summaries 356, for review by the user. This is indicated by block 386 in FIG. 4. Of course, component 354 can also allow the user to interact with the summarized answers, such as to identify a best answer among them, or allow the community to define the best answer among them, etc.
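The runtime flow of blocks 380 through 386 can be sketched in Python using the simplest comparison mentioned above, namely matching question type and keywords. The record layout, the stopword list, the Jaccard keyword score, and the `min_overlap` threshold are assumptions made for illustration. The question type of the input is passed in, because type identification is a separate step, and the summary strings in the usage example are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class StoredQuestion:
    """Hypothetical record in the store of summarized questions."""
    text: str
    qtype: str
    summary: str


# Small illustrative stopword list (an assumption, not from the description).
STOPWORDS = {"what", "is", "the", "you", "a", "to", "it", "of"}


def keywords(text):
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def search(input_text, input_type, store, min_overlap=0.3):
    """Return (question, summary) pairs for stored questions that share the
    input question's type and enough of its keywords, best match first."""
    query = keywords(input_text)
    hits = []
    for record in store:
        if record.qtype != input_type:
            continue  # question types must match
        stored = keywords(record.text)
        union = query | stored
        score = len(query & stored) / len(union) if union else 0.0
        if score >= min_overlap:
            hits.append((score, record))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [(record.text, record.summary) for _, record in hits]
```

A usage example:

```python
store = [
    StoredQuestion("What is the best sci-fi movie you ever saw?", "list",
                   "DEF Movie (5); GHI Movie (3)"),
    StoredQuestion("Is it strange to talk to a teddy bear?", "sentiment",
                   "support and against in equal numbers"),
]
matches = search("best sci-fi movie ever?", "list", store)
```

A parse-tree or logical-form comparison, as also mentioned above, would replace only the scoring step inside the loop.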

It should also be noted that question search component 354 can be any type of information search component, such as an information retrieval system. In that type of system, data store 110 may be a valuable repository of knowledge that can be searched in response to an information retrieval query. Therefore, system 354 is not limited to a community question answering system, but can be any other type of information seeking system as well, that accesses the data stored in store 110 based on community based question answering services.

With reference to FIG. 5, an exemplary system for implementing some embodiments includes a general-purpose computing device in the form of a computer 510. Components of computer 510 may include, but are not limited to, a processing unit 520, a system memory 530, and a system bus 521 that couples various system components including the system memory to the processing unit 520. The system bus 521 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.

Computer 510 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 510 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 510. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.

The system memory 530 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 531 and random access memory (RAM) 532. A basic input/output system 533 (BIOS), containing the basic routines that help to transfer information between elements within computer 510, such as during start-up, is typically stored in ROM 531. RAM 532 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 520. By way of example, and not limitation, FIG. 5 illustrates operating system 534, application programs 535, other program modules 536, and program data 537. The systems discussed above in FIGS. 1-4 can be stored in other program modules 536 or elsewhere, including being stored remotely.

The computer 510 may also include other removable/non-removable volatile/nonvolatile computer storage media. By way of example only, FIG. 5 illustrates a hard disk drive 541 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 551 that reads from or writes to a removable, nonvolatile magnetic disk 552, and an optical disk drive 555 that reads from or writes to a removable, nonvolatile optical disk 556 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 541 is typically connected to the system bus 521 through a non-removable memory interface such as interface 540, and magnetic disk drive 551 and optical disk drive 555 are typically connected to the system bus 521 by a removable memory interface, such as interface 550.

The drives and their associated computer storage media discussed above and illustrated in FIG. 5 provide storage of computer readable instructions, data structures, program modules and other data for the computer 510. In FIG. 5, for example, hard disk drive 541 is illustrated as storing operating system 544, application programs 545, other program modules 546, and program data 547. Note that these components can either be the same as or different from operating system 534, application programs 535, other program modules 536, and program data 537. Operating system 544, application programs 545, other program modules 546, and program data 547 are given different numbers here to illustrate that, at a minimum, they are different copies.

FIG. 5 shows the clustering system in other program modules 546. It should be noted, however, that it can reside elsewhere, including on a remote computer, or at other places.

A user may enter commands and information into the computer 510 through input devices such as a keyboard 562, a microphone 563, and a pointing device 561, such as a mouse, trackball or touch pad. Other input devices (not shown) may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 520 through a user input interface 560 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 591 or other type of display device is also connected to the system bus 521 via an interface, such as a video interface 590. In addition to the monitor, computers may also include other peripheral output devices such as speakers 597 and printer 596, which may be connected through an output peripheral interface 595.

The computer 510 is operated in a networked environment using logical connections to one or more remote computers, such as a remote computer 580. The remote computer 580 may be a personal computer, a hand-held device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 510. The logical connections depicted in FIG. 5 include a local area network (LAN) 571 and a wide area network (WAN) 573, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

When used in a LAN networking environment, the computer 510 is connected to the LAN 571 through a network interface or adapter 570. When used in a WAN networking environment, the computer 510 typically includes a modem 572 or other means for establishing communications over the WAN 573, such as the Internet. The modem 572, which may be internal or external, may be connected to the system bus 521 via the user input interface 560, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 510, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 5 illustrates remote application programs 585 as residing on remote computer 580. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

1. A system for summarizing answers to previously asked questions, the system comprising: a question type identifier component configured to access a question data store of previously asked questions along with answers submitted to the previously asked questions and select a question for answer summarization and to identify a question type of the selected question; an answer summarization component configured to generate answer summaries that summarize the answers to the selected question, the answer summaries being in a format that is determined by the question type of the selected question.
2. The system of claim 1 wherein the answer summarization component selects one of a plurality of different question summarization algorithms to generate the answer summaries for the answers to the selected question based on the question type.
3. The system of claim 1 and further comprising: a summary data store, wherein the answer summarization component is configured to store the selected question and the answer summaries generated for the selected question in the summary data store.
4. The system of claim 1 wherein there are a plurality of answers to the selected question in the question data store.
5. The system of claim 4 wherein the plurality of answers to the selected question include a subset of answers identified as best answers to the selected question, and wherein the answer summarization component identifies the best answers in the answer summaries.
6. The system of claim 5 wherein the answer summarization component generates the answer summaries from the plurality of answers, including the subset of best answers to the selected question.
7. The system of claim 5 wherein the question data store comprises questions and answers submitted in a community based question answer system wherein the best answers are identified by one of the community and an asker of the selected question.
8. The system of claim 1 and further comprising: a question search component configured to receive an input question and identify one or more similar questions in the summary data store and to output, in response to the input question, the answer summaries for the one or more similar questions.
9. The system of claim 8 wherein the question search component is configured to identify the one or more similar questions as having a similar question type to a question type of the input question and having similar content to content of the input question.
10. The system of claim 1 wherein the question type identifier is configured to identify question type by mapping the selected question to a question type taxonomy dividing the previously asked questions into different types.
11. A computer implemented method of answering an input question, comprising: accessing a plurality of previously asked questions along with answers previously submitted for the previously asked question; selecting one of the previously asked questions; identifying a question type of the selected question, the question type being one of a plurality of different question types, the plurality of different question types being independent of a content of a desired answer; selecting an answer summarization component to generate answer summaries for the answers to the selected question based on the question type identified, and storing the answer summaries; receiving the input question; comparing the input question to previously asked questions for which answer summaries have been generated; and outputting, in response to the input question, the answer summaries generated for previously asked questions that are similar to the input question.
12. The method of claim 11 wherein comparing comprises: identifying a content of the input question and content of the previously asked questions and comparing the content of the input question to the content of the previously asked questions.
13. The method of claim 12 wherein comparing comprises: identifying a question type of the input question and comparing it with the question type identified for the previously asked questions.
14. The method of claim 11 wherein selecting answer summarization component comprises: selecting one of a plurality of different answer summarization components, each answer summarization components generating answer summaries in a different format.
15. The method of claim 11 wherein the answers to the selected question have a subset of the answers identified as best answers and wherein the answer summaries identify the best answers.
16. The method of claim 11 wherein identifying a question type comprises: applying an automatic question type identifier algorithm to the selected question to identify the question type.
17. A question answering system, comprising: an answer summarization system configured to generate answer summaries for answers to a plurality of stored questions, the answer summaries having a format that is determined based on a type of the question that elicited the answer; and a question search component configured to receive an input question, identify a type of the input question, identify similar stored questions based on the type of the input question, based on types of the stored questions and based on content of the input question and the stored questions, and the question search component further being configured to output the answer summaries generated for the similar stored questions, in response to the input question.
18. The question answering system of claim 17 wherein the answer summarization system comprises: a question type identifier component configured to receive a user change to a user interface, the user change identifying a question type of the selected question.
19. The question answering system of claim 18 wherein the answer summarization system comprises: an answer summarization component configured to generate the answer summaries in the format determined by the question type.
20. The question answering system of claim 19 wherein the answer summarization component comprises: a plurality of different answer summarization modules, each generating the answer summaries in a different format, one answer summarization module corresponding to each different question type.